Skip to main content

LanceDB

Querying Overview

One way to specify a LanceDB query in Qarbine is to use a JSON-like structure. Below is an example to retrieve up to 10 matches from the quick_setup collection.

{
table: 'my_table',
nearVector: [100, 100],
filter: 'price > 12',
}

LanceDB offers a variety of parameters to control how the searches are performed, how filtering comparison is done and what properties are returned. For details see https://lancedb.github.io/lancedb/guides/tables/ and
https://lancedb.github.io/lancedb/basic/

Prerequisites

Prior to using Qarbine’s embeddings(...) macro function or the SQL-like query function nearText(...), the Qarbine Administrator must first configure “AI Assistant(s)”. The AI Assistants provide access to various popular Generative AI services and are referenced using an alias. Check with your Qarbine administrator for which ones are available and their proper use. For example, when using dynamic query vector embeddings, the model used by the AI Assistant must be compatible with the one used to generate the original embedding values in the database.

Query Specification Options

Primary Options

The primary specification options are described below.

Field Description
tableThe LanceDB table to perform the query upon.
nearVectorThe embedding from which to determine similarity.
nearTextThe value is a string with the similarity phrase. For example “dracula movies”. An embedding value fro the nearText argument will be obtained by Qarbine using a configured Qarbine AI Assistant. When using this option the model used to insert the LanceDB data must correspond to the one used by the Qarbine AI Assistant.
selectThe list of field names to return for each matching element. See https://lancedb.github.io/lancedb/javascript/classes/Query/#select
includeVector A boolean. The includeVector default is false indicating not to return the vectors in the answer set to conserve space.
filterThe SQL-like filtering expression.
nprobes nprobes is only used when the vector column has an IVF PQ index. The number of probes used. A higher number makes search more accurate but also slower. See https://lancedb.github.io/lancedb/ann_indexes/#use-gpu-to-build-vector-index
refineFactorRefine the results by reading extra elements and re-ranking them in memory. A higher number makes search more accurate but also slower. See https://lancedb.github.io/lancedb/ann_indexes/#use-gpu-to-build-vector-index
prefilter V1. A boolean.
postfilterV2 A boolean.
metricType V1. Options are: cosine, dot, l2. The default is l2. (ell2)
distanceTypeV2. Options are: cosine, dot, l2. The default is l2. (ell2)
sortByThe field to sort the answer set by.
offsetIt indicates how many objects of the answer set to skip over as part of the returned answer set.
limitThe maximum number of matches to return.

See
https://lancedb.github.io/lancedb/js/classes/VectorQuery/

Filters

Filters act like SQL WHERE clauses to determine which elements match. An element may be similar to any given vector, but not pass the filter rules. Below is specification using a filter.

{
table: 'my_table',
nearVector: [100, 100],
filter: 'price > 12',
}

Details on LanceDB filtering can be found at https://lancedb.github.io/lancedb/sql/.

Qarbine Enhanced Interaction Options

SQL Oriented Filtering

Recall that LanceDB supports semantic (i.e. vector) search and a lexical (i.e. scalar/matching) search. The use of the specification structure described above can be cumbersome at times. To improve readability and productivity when authoring LanceDB retrievals, Qarbine provides a SQL oriented option. For example, here is an example of a vector search retrieval for the quick_setup collection.

{
table: 'my_table',
nearVector: [100, 100],
filter: 'price > 12',
sortBy: "price",
includeVector: false,
metricType: "L2",
}

The Qarbine SQL equivalent is simply

select * 
from my_table
where nearVector (100, 100)
and withOption("metricType", "L2")
and price > 12
order by price
limit 10

Note that a SQL list is enclosed in parentheses while one in the specification is enclosed in brackets. That is a subtle nuance across the SQL and JSON syntax standards.

Qarbine’s LanceDB integration extends to the filtering features as well. Qarbine is your co-pilot translating SQL-oriented queries into their lower level specification equivalents. In some cases the Qarbine Data Source will have literally just the SQL statement above and nothing more. There are techniques to blend the ease of using SQL along with the powerful features of LanceDB within a Qarbine JSON specification object. The table below lists the fields that drive this definition.

JSON Field Description
sqlThe SQL statement can affect all of the primary options listed above.
sqlWhereThe string can affect all of the primary options listed above except for outputFields and collection.
sortBySqlThe ORDER BY clause specifying how to sort the LanceDB answer set.

Here is a simple example of combining the SQL and query specification approaches. The effective result is the same as the example query specification above.

{
sql: "select * from my_table",
nearVector: [ 100, 100 ],
}

The mapping of the standard SQL clauses to their LanceDB equivalents is described below.

Clause Description
SELECTThe names of the fields to return. Specifying “*” indicates all object fields . Here are some examples.SELECT * …SELECT title, rating_value …SELECT *,vector …Including the named vector field in the SELECT list overrides the default behavior of not including it in the answer set.
FROMThe name of the LanceDB table. This value sets the “table” field in the query specification.
WHERESee the discussion below. The effect is to set the “filter” field of the query specification.
ORDER BYThe sorting rules in “column Asc|desc” format. Sorting is done by Qarbine after LanceDB returns the answer set. This sets the “sortBySql” field of the query specification.
OFFSETIndicates where in the the return objects start return objects. This sets the “offset” field of the query specification.
LIMITIndicates at most how many elements to return. This sets the “limit” field of the query specification.

Bear in mind that some combinations of query fields may not make sense in the LanceDB world.

The WHERE clause criteria can be in a variety of traditional SQL forms and may include Qarbine specific functions descrbied below. For example,

select * from Movies where nearText("dracula")

results in a query specification with these fields,

table: "Movies",

nearText: "dracula"

Some additional Qarbine defined SQL function are listed below.

nearText(aPhrase)
nearVector(number1, number n …)
vector = (number 1, number n ...)A different way of expressing nearVector()
Function Description
nearVectorThis clause is removed from the WHERE criteria and its list of numbers argument set into “nearVector” field of the query specification.
nearTextThis clause is removed from the WHERE criteria and its argument set into “nearText” field of the query specification. The nearText argument can be used by query.nearText(), hybrid.nearText(), or generate.nearText(). Indicate which operation is wanted in the query specification.
withOptionPass in the specification field name and the value to set. This clause is removed from the WHERE clause.
withOptionsSet several specification fields at once. The format is withOptions(key1, value1, keyN, valueN).The key argument may use dot notation when setting the inner value of a component object.
For the following see https://lancedb.github.io/lancedb/sql/#sql-filters
toDateConverts the given date argument intodate "${dateString}
toTimestampConverts the given timestamp argument intotimestamp "${timestampString}
toDecimalConverts the given significantPlaces, decimalPlaces and someNumber arguments intodecimal(${sigPlaces},${decPlaces}) "${someNumber}"
columnIf your column name contains special characters or is a SQL Keyword, you can use backtick (`) to escape it. This function converts the given column name argument into`<columnName>`

Reviewing the Generated Specification

You can enter criteria of the form “EXPLAIN SELECT ….” to have the SQL statement processed and have the returned answer set be the underlying query specification. For example enter and run

explain
select * from my_table
where nearVector(100,100)
and price > 15 and price < 50

Select the single result element and its details are shown to the right.

  

A convenient way of specifying this is to have “explain” on the first line and the rest of your SQL on the next lines.

explain
select * from my_table
where nearVector(100,100)
and price > 15 and price < 50

Then simply “comment out” the first line when not in use

// explain
select * from my_table
where nearVector(100,100)
and price > 15 and price < 50

You can also use “explain: true” in the JSON query specification for similar information.

Another way to get the specification is to press ALT and click   . Below is a sample result.

  

Any “explain SELECT” or “explain: true” takes precedence over the ALT-click interaction.